Lowndes County
My LLM might Mimic AAE -- But When Should it?
Sandoval, Sandra C., Acquaye, Christabel, Cobbina, Kwesi, Teli, Mohammad Nayeem, Daumé, Hal III
We examine the representation of African American English (AAE) in large language models (LLMs), exploring (a) the perceptions Black Americans have of how effective these technologies are at producing authentic AAE, and (b) in what contexts Black Americans find this desirable. Through both a survey of Black Americans ($n=104$) and annotation of LLM-produced AAE by Black Americans ($n=228$), we find that Black Americans favor choice and autonomy in determining when AAE is appropriate in LLM output. They tend to prefer that LLMs default to communicating in Mainstream U.S. English in formal settings, with greater interest in AAE production in less formal settings. When LLMs were appropriately prompted and provided with in-context examples, our participants found their outputs to have a level of AAE authenticity on par with transcripts of Black American speech. Select code and data for our project can be found here: https://github.com/smelliecat/AAEMime.git
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > District of Columbia > Washington (0.14)
- North America > United States > Maryland (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties
Masis, Tessa, Neal, Anissa, Green, Lisa, O'Connor, Brendan
The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature's distribution across speakers, topics, and other variables, to either gain a qualitative understanding of the feature's function or systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
- Asia > India (0.06)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
5 Great Routes for Self-Driving Trucks--When They're Ready
Say you wake up tomorrow morning and there's a robo-truck just sitting in your driveway. Today, no one really has a self-driving truck yet--though plenty are working on it. Even the US Army is in on the act. Their advances--and testing operations in states like Nevada, California, Florida, Arizona, and Georgia--are impressive, but not there yet. Still, the tech should arrive one day, which is why that thought experiment is helpful.
- North America > United States > California (0.54)
- North America > United States > Arizona (0.27)
- North America > United States > Nevada (0.25)
- Automobiles & Trucks (1.00)
- Transportation > Passenger (0.91)
- Transportation > Ground > Road (0.91)
- Information Technology > Robotics & Automation (0.91)